
feat(Doclang): add content layer support #568

Merged
vagenas merged 2 commits into main from feature/doclang-content-layer on Mar 30, 2026

@vagenas (Member) commented Mar 26, 2026

No description provided.

Signed-off-by: Panos Vagenas <pva@zurich.ibm.com>

github-actions Bot (Contributor) commented Mar 26, 2026

DCO Check Passed

Thanks @vagenas, all your commits are properly signed off. 🎉

mergify Bot (Contributor) commented Mar 26, 2026

Merge Protections

Your pull request matches the following merge protections and will not be merged until they are valid.

🟢 Enforce conventional commit

Wonderful, this rule succeeded.

Make sure that we follow https://www.conventionalcommits.org/en/v1.0.0/

  • title ~= ^(fix|feat|docs|style|refactor|perf|test|build|ci|chore|revert)(?:\(.+\))?(!)?:

🟢 Require two reviewer for test updates

Wonderful, this rule succeeded.

When test data is updated, we require two reviewers

  • #approved-reviews-by >= 2
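
The conventional-commit rule above is a regex match on the PR title. A minimal Python sketch of the same check follows, assuming Mergify's `~=` operator behaves like an unanchored regex search (Python's `re.search`); the helper name `is_conventional_title` is hypothetical and not part of Mergify or docling-core.

```python
import re

# Pattern copied verbatim from the Mergify condition above.
CONVENTIONAL_TITLE = re.compile(
    r"^(fix|feat|docs|style|refactor|perf|test|build|ci|chore|revert)(?:\(.+\))?(!)?:"
)


def is_conventional_title(title: str) -> bool:
    """Hypothetical helper: True if the title satisfies the conventional-commit rule."""
    return CONVENTIONAL_TITLE.search(title) is not None


print(is_conventional_title("feat(Doclang): add content layer support"))  # True
print(is_conventional_title("rename layer attribute"))  # False
```

The optional `(?:\(.+\))?` group is what admits a scope such as `(Doclang)`, and `(!)?` admits the breaking-change marker.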

codecov Bot commented Mar 26, 2026

Codecov Report

❌ Patch coverage is 86.84211% with 5 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| docling_core/experimental/doclang.py | 86.84% | 5 Missing ⚠️ |
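
The two figures in the report are enough to recover the size of the patch. A short Python check follows, assuming Codecov's patch percentage is covered / (covered + missing) with no partially covered lines (the report lists none); `infer_patch_size` is a hypothetical helper, not a Codecov API.

```python
def infer_patch_size(coverage_pct: float, missing: int) -> tuple[int, int]:
    """Return (total, covered) coverable lines implied by a coverage % and a miss count.

    Assumes coverage = covered / (covered + missing), i.e. no partial lines.
    """
    miss_fraction = 1 - coverage_pct / 100
    total = round(missing / miss_fraction)
    return total, total - missing


total, covered = infer_patch_size(86.84211, 5)
print(total, covered)  # 38 33
```

Under that assumption the patch has 38 coverable changed lines, 33 of them covered (33 / 38 ≈ 86.84211%).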


@vagenas vagenas mentioned this pull request Mar 26, 2026
52 tasks
Signed-off-by: Panos Vagenas <pva@zurich.ibm.com>
@vagenas vagenas requested a review from dolfim-ibm March 30, 2026 08:31
@vagenas vagenas merged commit fe9bbfb into main Mar 30, 2026
14 of 19 checks passed
@vagenas vagenas deleted the feature/doclang-content-layer branch March 30, 2026 08:40
dosubot Bot commented Mar 30, 2026

Documentation Updates

1 document was updated by changes in this PR:

Content Layers
@@ -92,6 +92,47 @@
 
 The corresponding save methods (`save_as_text()`, `save_as_markdown()`, `save_as_html()`, `save_as_vtt()`) support the same parameters as their export counterparts, with one difference: `save_as_vtt()` defaults `omit_voice_end` to True (while `export_to_vtt()` defaults it to False) for more concise output files.
 
+### Doclang (Experimental) Serializer
+
+The experimental Doclang serializer (`docling_core/experimental/doclang.py`) also supports content layer filtering and annotation via `DoclangParams`.
+
+**Content layer filtering:** The `layers` field controls which content layers are serialized. It accepts a `set[ContentLayer]` and defaults to all content layers. To serialize only the body layer, for example:
+
+```python
+from docling_core.experimental.doclang import DoclangParams
+from docling_core.types.doc import ContentLayer
+
+params = DoclangParams(layers={ContentLayer.BODY})
+```
+
+**Layer annotation:** The `layer_mode` field of type `LayerMode` controls whether a `<layer class="..."/>` self-closing XML token is emitted for each item:
+- `LayerMode.MINIMAL` (default): emits `<layer class="..."/>` only when the item's content layer differs from `ContentLayer.BODY`.
+- `LayerMode.ALWAYS`: emits `<layer class="..."/>` for every item, regardless of its layer.
+
+```python
+from docling_core.experimental.doclang import DoclangParams, LayerMode
+from docling_core.types.doc import ContentLayer
+
+params = DoclangParams(
+    layers={ContentLayer.BODY, ContentLayer.FURNITURE},
+    layer_mode=LayerMode.MINIMAL,
+)
+```
+
+In the serialized XML output, content layer information appears as an embedded self-closing token, for example:
+
+```xml
+<page_header>
+  <layer class="furniture"/>
+  Page Header
+</page_header>
+<text>
+  Main body content
+</text>
+```
+
+With `LayerMode.ALWAYS`, `<layer class="body"/>` would also appear inside the `<text>` block above.
+
 ## Iterating Over Items Including Furniture
 
 For advanced use cases, such as iterating over document items including headers and footers, use the `iterate_items` method with the appropriate content layers:
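
The filtering and annotation rules described in the diff above can be illustrated with a small, self-contained Python sketch. It does not import docling-core: `ContentLayer` and `LayerMode` below are hypothetical stand-ins that mirror only the members named in the documentation, `serialize` and `layer_token` are hypothetical helpers rather than the actual `doclang.py` implementation, and items are modeled as `(tag, layer, text)` tuples purely for illustration.

```python
from enum import Enum
from typing import Iterable, Optional


class ContentLayer(Enum):
    # Hypothetical stand-in; the real enum lives in docling_core.types.doc.
    BODY = "body"
    FURNITURE = "furniture"


class LayerMode(Enum):
    # Hypothetical stand-in mirroring the two documented modes.
    MINIMAL = "minimal"
    ALWAYS = "always"


def layer_token(layer: ContentLayer, mode: LayerMode) -> Optional[str]:
    """Return the <layer .../> token for an item, or None if it is omitted."""
    # MINIMAL only annotates non-body items; ALWAYS annotates every item.
    if mode is LayerMode.ALWAYS or layer is not ContentLayer.BODY:
        return f'<layer class="{layer.value}"/>'
    return None


def serialize(
    items: Iterable[tuple[str, ContentLayer, str]],
    layers: Optional[set[ContentLayer]] = None,
    mode: LayerMode = LayerMode.MINIMAL,
) -> str:
    """Serialize (tag, layer, text) items, keeping only the requested layers."""
    allowed = set(ContentLayer) if layers is None else layers  # default: all layers
    lines: list[str] = []
    for tag, layer, text in items:
        if layer not in allowed:
            continue  # content layer filtering
        lines.append(f"<{tag}>")
        token = layer_token(layer, mode)
        if token is not None:
            lines.append(f"  {token}")
        lines.append(f"  {text}")
        lines.append(f"</{tag}>")
    return "\n".join(lines)


items = [
    ("page_header", ContentLayer.FURNITURE, "Page Header"),
    ("text", ContentLayer.BODY, "Main body content"),
]
print(serialize(items))
```

With the defaults this prints the same XML as the example in the diff. Passing `mode=LayerMode.ALWAYS` adds `<layer class="body"/>` inside the `<text>` block, and `layers={ContentLayer.BODY}` drops the `page_header` item entirely.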


ceberam pushed a commit to odelliab/docling-core that referenced this pull request Apr 9, 2026
* feat(Doclang): add content layer support

Signed-off-by: Panos Vagenas <pva@zurich.ibm.com>

* rename layer attribute

Signed-off-by: Panos Vagenas <pva@zurich.ibm.com>

---------

Signed-off-by: Panos Vagenas <pva@zurich.ibm.com>

Labels

None yet

Projects

None yet

3 participants